Spatio-Temporal Credit Assignment in Neuronal Population Learning

Authors

  • Johannes Friedrich
  • Robert Urbanczik
  • Walter Senn
Abstract

In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement, even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons, which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions, and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions, which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme with that of temporal-difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
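The three-stage cascade described in the abstract can be illustrated with a toy discrete-time sketch for a single synapse. This is not the authors' model: the class name `CascadeSynapse`, the helper `delayed_reward_demo`, the per-step decay factors, the learning rate, and the simple multiplicative gating are all assumptions chosen for illustration. The only idea taken from the abstract is the order of the stages: presynaptic input is correlated with postsynaptic events, then with a decision signal, then with external reinforcement.

```python
from dataclasses import dataclass


@dataclass
class CascadeSynapse:
    """Toy three-stage trace cascade for one synapse (illustrative assumptions only)."""
    weight: float = 0.0
    pre_post: float = 0.0     # stage 1 trace: presynaptic input x postsynaptic event
    decision: float = 0.0     # stage 2 trace: stage 1 gated by decision feedback
    decay_fast: float = 0.5   # hypothetical per-step decay of stage 1
    decay_slow: float = 0.9   # hypothetical per-step decay of stage 2
    lr: float = 0.1           # hypothetical learning rate

    def step(self, pre, post, decision_feedback, reward):
        # Stage 1: leaky trace of pre/post coincidence.
        self.pre_post = self.decay_fast * self.pre_post + pre * post
        # Stage 2: slower trace, fed by stage 1 when decision feedback arrives.
        self.decision = self.decay_slow * self.decision + decision_feedback * self.pre_post
        # Stage 3: reinforcement converts the slow trace into a weight change.
        self.weight += self.lr * reward * self.decision
        return self.weight


def delayed_reward_demo(delay=5, pre=1.0):
    """One pre/post coincidence and decision at step 0, reward `delay` steps later."""
    syn = CascadeSynapse()
    syn.step(pre=pre, post=1.0, decision_feedback=1.0, reward=0.0)
    for _ in range(delay - 1):
        syn.step(pre=0.0, post=0.0, decision_feedback=0.0, reward=0.0)
    return syn.step(pre=0.0, post=0.0, decision_feedback=0.0, reward=1.0)


if __name__ == "__main__":
    print(delayed_reward_demo(delay=5))           # active synapse gets credit
    print(delayed_reward_demo(delay=5, pre=0.0))  # silent synapse gets none
```

In this sketch the slow second-stage trace is what bridges the gap between a decision and its delayed reward: a synapse that was active at decision time still carries a decayed trace when the reward arrives, while a synapse with no presynaptic input receives no weight change.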


Similar resources

Reinforcement learning for multi-step problems

In reinforcement learning for multi-step problems, the sparse nature of the feedback aggravates the difficulty of learning to perform. This paper explores the use of a reinforcement learning architecture, leading to a discussion of reinforcement learning in terms of feature abstraction, credit-assignment, and temporal-difference learning. Issues discussed include: the conditioning of the reinfo...


Neural Correlates of Temporal Credit Assignment

When feedback follows a sequence of decisions, how do people assign credit to intermediate actions within the sequence? To explore this temporal credit assignment problem, we recorded event-related potentials (ERPs) as participants performed a sequential decision task. Our ERP analyses focused on feedback-related negativity (FRN), a component thought to reflect neural reward prediction error. T...


Neuronal activity in dorsomedial and dorsolateral striatum under the requirement for temporal credit assignment

To investigate neural processes underlying temporal credit assignment in the striatum, we recorded neuronal activity in the dorsomedial and dorsolateral striatum (DMS and DLS, respectively) of rats performing a dynamic foraging task in which a choice has to be remembered until its outcome is revealed for correct credit assignment. Choice signals appeared sequentially, initially in the DMS and t...


QUICR-Learning for Multi-Agent Coordination

Coordinating multiple agents that need to perform a sequence of actions to maximize a system level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assig...


Quicker Q-Learning in Multi-Agent Systems

Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t′ > t; and 2) how to credit an action taken by agent i, considering the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temp...



Volume: 7

Publication date: 2011